
#fix 35 #41

Merged
freelw merged 2 commits into main from wangli_dev_20250617_2 on Jun 17, 2025

Conversation

@freelw
Owner

@freelw freelw commented Jun 17, 2025

#fix 35

freelw added 2 commits June 17, 2025 12:21
@freelw
Owner Author

freelw commented Jun 17, 2025

./lm
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 10
batch_size : 16
dropout : 0.2
gpu : 1
learning rate : 0.001
checkpoint :
max_words_cnt : 256
token_ids_size : 256
Allocating memory
for tensors : 36609236 bytes,
for c_tensors: 3194706336 bytes
for grad_tensors: 1241779004 bytes
epoch 0 : [192/224]loss : 5.84092
epoch 1 : [192/224]loss : 2.01121
epoch 2 : [32/224]loss : 0.76783
checkpoint saved : ./checkpoints/checkpoint_20250617_122712_3.bin

@freelw freelw requested a review from dratman June 17, 2025 04:28
@freelw
Owner Author

freelw commented Jun 17, 2025

The issue has been fixed. The cause was that a signed int was incorrectly used instead of an unsigned int when calculating the offset of the Metal buffer. I also reset the default batch_size to 16; with this setting the program runs correctly, but it triggers swap on my MacBook and makes the machine run a bit slowly. It's therefore generally better to reduce it, for example with the -b 8 configuration. batch_size can also be set slightly larger, but not by much, because buffer offsets beyond the range of a uint are not supported. You can try the latest code from the main branch and check whether the loss decreases with the default parameters on your MacBook. @dratman

@freelw freelw merged commit eac9698 into main Jun 17, 2025
1 check passed
@freelw
Owner Author

freelw commented Jun 17, 2025

fix #35

@freelw
Owner Author

freelw commented Jun 17, 2025

(base) ➜ cpp-transformer git:(main) ./lm
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 10
batch_size : 16
dropout : 0.2
gpu : 1
learning rate : 0.001
checkpoint :
max_words_cnt : 256
token_ids_size : 256
Allocating memory
for tensors : 36609236 bytes,
for c_tensors: 3194706336 bytes
for grad_tensors: 1241779004 bytes
epoch 0 : [192/224]loss : 6.02557
epoch 1 : [192/224]loss : 2.02572
epoch 2 : [192/224]loss : 0.380724
epoch 3 : [192/224]loss : 0.0781031
epoch 4 : [192/224]loss : 0.0326921
epoch 5 : [192/224]loss : 0.0226658
epoch 6 : [192/224]loss : 0.019125
epoch 7 : [192/224]loss : 0.0176224
epoch 8 : [192/224]loss : 0.0167923
epoch 9 : [192/224]loss : 0.0157845
checkpoint saved : ./checkpoints/checkpoint_20250617_125929_9.bin

@dratman
Collaborator

dratman commented Jun 17, 2025

Fortunately my MacBook has 64 GByte shared RAM so not likely to be a problem. Will continue testing in the morning.

@freelw
Owner Author

freelw commented Jun 17, 2025

> Fortunately my MacBook has 64 GByte shared RAM so not likely to be a problem. Will continue testing in the morning.

I'm so envious of you. By the way, should I support video memory exceeding 4GB? Hahahaha

@dratman
Collaborator

dratman commented Jun 18, 2025 via email

@freelw
Owner Author

freelw commented Jun 18, 2025

@dratman
I am amazed that you still maintain such a passion for learning; it is truly admirable!

First of all, I wish you good health.

May I contact you directly via the email published on your GitHub profile? My email is freelw81@qq.com.

Regarding the issue mentioned above: the output did not change significantly when the prompt was modified. I think there may be two reasons:

  1. The training data we used contains only 256 words, which may cause the model's output to lack diversity. You can use the parameter -m 10000000 to let the program train on the complete Time Machine data. On my machine, one epoch then takes 4 hours.
  2. The positional encoding I used is absolute positional encoding, which should differ from standard GPT-2. I am still learning about this area and plan to try other positional encodings in future versions.

@dratman
Collaborator

dratman commented Jun 18, 2025 via email

@freelw
Owner Author

freelw commented Jun 19, 2025

> Of course, feel free to send email to my regular address. --- By "the full Time Machine data" you mean the 178 KByte novel in the file timemachine.txt? I assumed I was already training with that. To train with the whole novel, I just add "-m 10000000" to the training command? 4 hours per epoch is no problem. I can easily let it run overnight or even for several days.

If you add a parameter like -m 10000000, the output should contain lines like the following:

./lm -m 100000000
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 10
batch_size : 16
dropout : 0.2
gpu : 1
learning rate : 0.001
checkpoint : 
max_words_cnt : 100000000
token_ids_size : 32775
Allocating memory  
for tensors : 36609236 bytes, 
for c_tensors: 3194706336 bytes 
for grad_tensors: 1241779004 bytes
epoch 0 :  [80/32743]loss : 7.45601

Note that the denominator on the "epoch" progress line is 32743 and that token_ids_size is 32775. Only when you see values like these can you be sure that the full text of the novel is being used for training.
@dratman

@freelw
Owner Author

freelw commented Jun 19, 2025

You're really amazing! My project was also inspired by two of Andrej Karpathy's projects: llm.c and micrograd. I see you've already looked at llm.c. It's indeed a remarkable project, but I don't think it's particularly suitable for understanding deep learning. I highly recommend micrograd; its implementation is simple. If you have a basic grasp of the backpropagation mechanism, that project might suddenly make things click for you, just as it did for me. It's extremely inspiring: simple to implement yet perfect for learning.

> About my continued interest in this field: I spent two years as an undergraduate at UC Berkeley in 1969-1971, concentrating on physics and math. Later I found a career in both logic design and software development. Many years and events went by. My wife and I raised two children but lost one to a drug overdose. Over time I began to feel old, and my ability to learn new technical material was actually declining until about 2022, when I first found out about the astonishing architecture of the GPT-type language models. The high-dimensional vectors I read about in connection with GPT-3, successively modified through dozens of processing layers, were unlike anything I could have imagined as a way of representing word, sub-word or character tokens. The idea of changing an integer representing an English letter, word, or a Chinese character into a thousand-dimensional vector of floating-point numbers seemed to defy common sense. I was immediately determined to understand what was going on. I started reading extensively and watching YouTube videos, and gradually -- to my surprise -- some of my technical acumen returned. I began playing with Andrej Karpathy's makemore and similar small models.
>
> I am still trying to fully grasp how it is possible that this bizarre system of vectors and weights can understand what I write, and then reply in ways that often help me understand some topic more quickly than by the old methods of study.

